WikiWalk: Random walks on Wikipedia for Semantic Relatedness

نویسندگان

  • Eric Yeh
  • Daniel Ramage
  • Christopher D. Manning
  • Eneko Agirre
  • Aitor Soroa
چکیده

Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge from a broad range of real-world concepts and relationships. We address this knowledge integration issue with a method of computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements over a state of the art measure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods

Semantic relatedness between words has been extracted from a variety of sources. In this ongoing work, we explore and compare several options for determining if semantic relatedness can be extracted from navigation structures in Wikipedia. In that direction, we first investigate the potential of representation learning techniques such as DeepWalk in comparison to previously applied methods base...

متن کامل

Exploring the use of word embeddings and random walks on Wikipedia for the CogAlex shared task

In our participation on the task we wanted to test three different kinds of relatedness algorithms: one based on embeddings induced from corpora, another based on random walks on WordNet and a last one based on random walks based on Wikipedia. All three of them perform similarly in noun relatedness datasets like WordSim353, close to the highest reported values. Although the task definition gave...

متن کامل

A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia

Determining semantic relatedness between words or concepts is a fundamental process to many Natural Language Processing applications. Approaches for this task typically make use of knowledge resources such as WordNet and Wikipedia. However, these approaches only make use of limited number of features extracted from these resources, without investigating the usefulness of combining various diffe...

متن کامل

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

WikiRelate! Computing Semantic Relatedness Using Wikipedia

Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009